There is a newer version of the record available.

Published November 11, 2021 | Version v1
Dataset Open

OC-782K: Knowledge Graph of "Scientometrics" modelled according to the OpenCitations Data Model

Description

This dataset is a knowledge graph (KG) extracted from a triplestore that covers the journal Scientometrics and is modelled according to the OpenCitations Data Model. The original triplestore is available here. The KG was extracted for a research project on knowledge graph embeddings (KGEs) for author name disambiguation (AND). The structural triples of the knowledge graph are split into training, testing, and validation sets so that representation learning methods can be applied. Textual and numeric literals are stored separately to support multimodal approaches to KGEs (see arXiv:1802.00934). For the same reason, the textual literals are already encoded as sentence embeddings in the file textual_literals.npy, and the numeric literals as a numeric matrix in the file numeric_literals.npy. The file and_eval.json contains the evaluation dataset used to evaluate our AND architecture. The script used to gather this dataset is in the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/open-citations.
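As a rough illustration of how the precomputed literal files and the evaluation file named above might be read, here is a minimal Python sketch. It assumes the archive has been extracted and that textual_literals.npy, numeric_literals.npy, and and_eval.json sit directly in one directory; the actual layout inside OC-782K.zip, the array shapes, and the structure of the JSON are not documented here, so the helper names `load_literals` and `load_and_eval` and the `data_dir` argument are hypothetical.

```python
import json
from pathlib import Path

import numpy as np


def load_literals(data_dir):
    """Load the two precomputed literal arrays from an extracted copy of the archive.

    Assumption: both .npy files are plain NumPy arrays located directly in data_dir.
    """
    data_dir = Path(data_dir)
    textual = np.load(data_dir / "textual_literals.npy")   # sentence embeddings
    numeric = np.load(data_dir / "numeric_literals.npy")   # numeric matrix
    return textual, numeric


def load_and_eval(data_dir):
    """Load the evaluation dataset; its internal structure is not assumed here."""
    with open(Path(data_dir) / "and_eval.json", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `load_literals("OC-782K")` on an extracted copy would return the two arrays, whose shapes can then be inspected with `.shape` before they are passed to an embedding model. How the rows align with the entities in the structural triples should be checked against the repository linked above.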

Files

Files (230.7 MB)

Name: OC-782K.zip
Size: 230.7 MB
Checksum: md5:fe4df744ee5f00f97670fb1893ba5466
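The MD5 checksum listed for OC-782K.zip can be used to confirm that a download is complete. The following is a minimal sketch using only the Python standard library; the helper name `file_md5` and the local path in the usage note are hypothetical, and the file is read in chunks so the 230.7 MB archive does not have to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, assuming the archive was saved as `OC-782K.zip` in the working directory, `file_md5("OC-782K.zip") == "fe4df744ee5f00f97670fb1893ba5466"` should evaluate to `True` for an intact download.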